Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker
Abstract
Large-scale parallel scientific applications are generating so much data that tertiary storage systems have emerged as a popular place to hold it. SRB (Storage Resource Broker), a uniform interface to various storage systems, including tertiary storage systems such as HPSS and UniTree, has become an important and convenient way to access tertiary data across networks in a distributed environment. However, SRB is not optimized for parallel data access: like a UNIX I/O call, each SRB I/O call to a storage system can access only one contiguous piece of data. For many access patterns, this results in numerous small I/O calls, which are very expensive. In this paper, we present a run-time library, SRB-OL, that optimizes tertiary storage access on top of SRB's low-level I/O functions. SRB-OL extends various state-of-the-art I/O optimizations found in secondary storage systems to a remote data access environment via SRB. We also present a novel optimization scheme, the superfile, which handles large numbers of small files efficiently. In addition, we incorporate a subfile technique and other SRB features, such as container, migrate, stage, and purge, into SRB-OL. Which of these optimizations to apply is decided by a Meta-data Management System (MDMS) [7] that resides one level above SRB-OL: the user passes access-pattern information (hints) from the application to MDMS, MDMS uses these hints to choose an optimal I/O approach and passes its decision to SRB-OL, and SRB-OL then performs optimized SRB I/O calls to access the data residing on tertiary storage. To give a quantitative view of the optimized SRB I/O functions, we propose a performance model based on an extensive set of I/O experiments. Using this performance model, we demonstrate that collective I/O, superfiles, and the other optimizations yield significant performance improvements. Finally, we present an I/O Performance Predictor that estimates I/O cost before the user actually carries out her experiment, which is of considerable benefit to the user when running her application.
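Why collective I/O helps when every call must be contiguous can be made concrete with a toy cost model. The sketch below is an assumption for illustration, not the paper's performance model: it charges each I/O call a fixed latency plus transfer time at a fixed bandwidth, and compares four processes reading a block-cyclic (interleaved) pattern independently, one call per contiguous piece, against a two-phase collective read in which a few aggregator processes each fetch one contiguous file domain. The names `CallCostModel`, `interleaved_requests`, `independent_cost`, and `collective_cost`, and the latency and bandwidth values, are hypothetical; the in-memory redistribution phase of collective I/O is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CallCostModel:
    """Toy cost model (assumed): each I/O call pays a fixed latency,
    and every byte moved costs 1/bandwidth seconds."""
    latency_s: float      # hypothetical per-call overhead, seconds
    bandwidth_Bps: float  # hypothetical transfer rate, bytes per second

    def cost(self, n_calls: int, total_bytes: int) -> float:
        return n_calls * self.latency_s + total_bytes / self.bandwidth_Bps


def interleaved_requests(n_procs: int, blocks_per_proc: int, block: int):
    """Block-cyclic pattern: process p owns blocks p, p + P, p + 2P, ...
    Returns {process: [(offset, length), ...]}."""
    return {
        p: [((i * n_procs + p) * block, block) for i in range(blocks_per_proc)]
        for p in range(n_procs)
    }


def _all_pieces(reqs):
    return [piece for proc_reqs in reqs.values() for piece in proc_reqs]


def independent_cost(model: CallCostModel, reqs) -> float:
    # UNIX-style access: one call per contiguous piece, per process.
    pieces = _all_pieces(reqs)
    return model.cost(len(pieces), sum(length for _, length in pieces))


def collective_cost(model: CallCostModel, reqs, n_aggregators: int) -> float:
    # Two-phase collective I/O: the byte range spanned by all requests is
    # divided among n_aggregators processes, each issuing one call for its
    # contiguous file domain. Any holes in the span are read too; the
    # in-memory redistribution phase is not modeled.
    pieces = _all_pieces(reqs)
    start = min(off for off, _ in pieces)
    end = max(off + length for off, length in pieces)
    return model.cost(n_aggregators, end - start)


if __name__ == "__main__":
    model = CallCostModel(latency_s=0.05, bandwidth_Bps=10e6)
    reqs = interleaved_requests(n_procs=4, blocks_per_proc=256, block=4096)
    print(f"independent: {independent_cost(model, reqs):.2f} s")
    print(f"collective:  {collective_cost(model, reqs, n_aggregators=4):.2f} s")
```

With the assumed 50 ms per-call latency, the 1,024 independent calls cost about 51.6 s in this model versus about 0.62 s for four collective calls; both move the same 4 MiB, so the gap comes entirely from the per-call term.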
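The abstract does not describe the superfile's format, so the following is only a minimal sketch of the general idea under an assumed layout: many small files are concatenated into one object preceded by an index of (offset, length) entries, so that a single large transfer replaces one open/read/close sequence per small file. The layout (8-byte big-endian index length, JSON index, raw payloads) and the names `pack_superfile` and `Superfile` are hypothetical; an SRB-backed version would store and fetch the packed object through SRB rather than hold it in memory.

```python
import json
import struct


def pack_superfile(members: dict) -> bytes:
    """Pack many small files into one object (hypothetical layout):
    8-byte big-endian index length, a UTF-8 JSON index, then the raw
    payloads. The index maps each name to [offset, length] in the payloads."""
    index, payload = {}, bytearray()
    for name, data in members.items():
        index[name] = [len(payload), len(data)]
        payload += data
    header = json.dumps(index).encode("utf-8")
    return struct.pack(">Q", len(header)) + header + bytes(payload)


class Superfile:
    """Read side: the packed object is fetched with one large transfer;
    after that, each small file is an in-memory slice, not a separate call."""

    def __init__(self, blob: bytes):
        (index_len,) = struct.unpack_from(">Q", blob, 0)
        self._index = json.loads(blob[8:8 + index_len].decode("utf-8"))
        self._base = 8 + index_len
        self._blob = blob

    def names(self) -> list:
        return list(self._index)

    def read(self, name: str) -> bytes:
        offset, length = self._index[name]
        start = self._base + offset
        return self._blob[start:start + length]


if __name__ == "__main__":
    small_files = {f"run{i:04d}.dat": bytes([i % 256]) * 64 for i in range(1000)}
    sf = Superfile(pack_superfile(small_files))
    assert sf.read("run0042.dat") == small_files["run0042.dat"]
    print(f"{len(sf.names())} members packed into one object")
```

Appending members to, or deleting them from, a packed object would require rewriting the index; this sketch assumes the superfile is written once and read many times.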